Conditional Dependencies: A Principled Approach to Improving Data Quality
Authors
Abstract
Real-life data is often dirty, costing businesses worldwide billions of pounds each year. This paper presents a promising approach to improving data quality. It effectively detects and fixes inconsistencies in real-life data using conditional dependencies, which extend database dependencies by enforcing bindings of semantically related data values. It accurately identifies records from unreliable data sources using relative candidate keys, which extend keys for relations by supporting similarity and matching operators across relations. In contrast to traditional dependencies, which were developed to improve the quality of schemas, the revised constraints are proposed to improve the quality of data. These constraints yield practical techniques for data repairing and record matching in a uniform framework.
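To make the idea of "enforcing bindings of semantically related data values" concrete, the following is a minimal sketch of how violations of a conditional functional dependency might be detected. It is not the paper's algorithm: the function names (`matches`, `cfd_violations`), the `_` wildcard convention, and the sample customer records are all assumptions introduced for illustration. A rule here is an ordinary functional dependency plus a pattern; a constant on the right-hand side fixes the required value, while a wildcard only requires that matching rows agree with one another.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed convention: "_" in a pattern matches any value


def matches(row, attrs, pattern):
    """True if the row agrees with every constant in the pattern."""
    return all(p == WILDCARD or row[a] == p for a, p in zip(attrs, pattern))


def cfd_violations(rows, lhs, lhs_pattern, rhs, rhs_pattern):
    """Return sorted indices of rows violating the rule (lhs -> rhs, pattern).

    Only rows matching lhs_pattern are constrained. A constant rhs_pattern
    must be met by each such row on its own; a wildcard rhs_pattern requires
    rows that share the same lhs values to share the same rhs value.
    """
    violations = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if not matches(row, lhs, lhs_pattern):
            continue  # the condition does not apply to this row
        if rhs_pattern != WILDCARD:
            if row[rhs] != rhs_pattern:
                violations.add(i)  # a single row can violate a constant rule
        else:
            groups[tuple(row[a] for a in lhs)].append(i)
    for idxs in groups.values():
        if len({rows[i][rhs] for i in idxs}) > 1:
            violations.update(idxs)  # rows disagree on the rhs value
    return sorted(violations)


# Hypothetical customer records, invented for this example.
rows = [
    {"country": "44", "area_code": "131", "city": "Edinburgh",
     "zip": "EH4 1DT", "street": "Mayfield"},
    {"country": "44", "area_code": "131", "city": "London",
     "zip": "EH4 1DT", "street": "Crichton"},
    {"country": "01", "area_code": "908", "city": "Murray Hill",
     "zip": "07974", "street": "Mountain Ave"},
    {"country": "01", "area_code": "908", "city": "Murray Hill",
     "zip": "07974", "street": "Main St"},
]

# Constant rule: country 44 with area code 131 implies city Edinburgh.
constant_hits = cfd_violations(
    rows, ["country", "area_code"], ("44", "131"), "city", "Edinburgh")

# Conditional rule: within country 44 only, zip determines street.
variable_hits = cfd_violations(
    rows, ["country", "zip"], ("44", WILDCARD), "street", WILDCARD)
```

In this sketch the second rule flags the two country-44 rows, which share a zip code but disagree on the street, and leaves the country-01 rows alone even though they also disagree, because the condition does not cover them. A plain functional dependency over the whole table could not express that distinction.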
Related resources
Discovering Conditional Functional Dependencies to Detect Data Inconsistencies
Poor quality data is a growing and costly problem that affects many enterprises across all aspects of their business ranging from operational efficiency to revenue protection. In this paper, we present an approach that efficiently and robustly discovers conditional functional dependencies for detecting inconsistencies in data and hence improves data quality. We evaluate our approach empirically...
Mining Constant Conditional Functional Dependencies for Improving Data Quality
This paper applies data mining techniques to data cleaning, using them to discover Constant Conditional Functional Dependencies (CCFDs) in relational databases. These CCFDs are used as business rules for context-dependent data validation. Conditional Functional Dependencies (CFDs) are an extension of Functional Dependencies (FDs) which captures the consistency of data by su...
Data-driven extensions to HMM statistical dependencies
In this paper, a new technique is introduced that relaxes the HMM conditional independence assumption in a principled way. Without increasing the number of states, the modeling power of an HMM is increased by including only those additional probabilistic dependencies (to the surrounding observation context) that are believed to be both relevant and discriminative. Conditional mutual information...
Self-Organizing Maps in data analysis - notes on overfitting and overinterpretation
The Self-Organizing Map, SOM, is a widely used tool in exploratory data analysis. Visual inspection of the SOM can be used to list potential dependencies between variables, that are then validated with more principled statistical methods. In this paper we discuss the use of the SOM in searching for dependencies in the data. We point out that simple use of the SOM may lead to excessive number ...
Self-Organizing Map in Data-Analysis - Notes on Overfitting and Overinterpretation
The Self-Organizing Map, SOM, is a widely used tool in exploratory data analysis. Visual inspection of the SOM can be used to list potential dependencies between variables, that are then validated with more principled statistical methods. In this paper we discuss the use of the SOM in searching for dependencies in the data. We point out that simple use of the SOM may lead to excessive number of...
Publication date: 2009